Historically, hundreds of thousands of chum salmon (Oncorhynchus keta) adults returned annually to the Lower Columbia River (LCR). However, much like other salmon populations throughout the Pacific Northwest, the overall abundance of LCR chum has declined precipitously as a result of anthropogenic impacts and climatic changes. In 1999, Columbia River chum salmon were listed as Threatened under the Endangered Species Act (ESA), prompting state and federal agencies to begin recovery efforts. Specifically, the Washington Department of Fish and Wildlife (WDFW) began implementing a chum recovery program that uses a three-pronged approach, which includes habitat restoration, hatchery supplementation, and monitoring and evaluation. Each of these prongs has been implemented for 15–20 years, but the resulting data sets have not previously been integrated to understand the patterns and drivers of temporal and spatial variability in chum abundance and productivity.
In 2016, WDFW proposed the development of a life-cycle model (LCM) to improve our understanding of LCR chum population dynamics and to provide a platform for future analyses that could: (1) help identify limiting life stages through time and space, (2) help identify criteria to assess the need for supplementation, and (3) help prioritize habitat restoration by identifying the features of viable chum habitat patches.
Here we describe a stage-structured, multi-population LCM for LCR chum formulated as an integrated population model (IPM), in which a joint likelihood function allows the model to be fitted to all available data at once (Schaub and Abadi 2011). IPMs are the state-of-the-art framework for parameterizing structured population models and have been used widely in fisheries and wildlife ecology because they offer numerous advantages over an ad hoc, piecemeal approach, especially when data are sparse or heterogeneous (Newman et al. 2006, Su and Peterman 2012, Fleischman et al. 2013, Maunder and Punt 2013). The heart of an IPM is a state-space model, a form of hierarchical model that distinguishes true ecological process stochasticity from observation noise (de Valpine and Hilborn 2005).
Our LCR chum IPM is developed as part of the R package salmonIPM (Buhle et al. 2018), which facilitates fitting a variety of age- and stage-structured IPMs tailored to anadromous salmonid life histories. This platform allows a reproducible workflow from data to model-fitting and management scenario analysis, which can be easily updated as new data become available. Models in salmonIPM are implemented in Stan (Stan Development Team 2019), a probabilistic programming language for Bayesian estimation that uses the cutting-edge Hamiltonian Monte Carlo (HMC) algorithm to draw samples from the posterior distribution (Monnahan et al. 2017).
This vignette first details the structure of the LCR chum IPM, including the process model (the true underlying population dynamics), the observation model (the likelihood of data arising from field monitoring programs given the unknown true state), and the priors. We then demonstrate how to fit LCR chum models using salmonIPM and discuss and interpret the main results, including reporting metrics of interest to managers. We conclude by outlining our plans for further model development. The goal of the vignette is to provide an introduction to both the mathematical structure of the model and the workflow involved in fitting it in R and working with the output. To that end, we display a minimal amount of R code, but all the code used in the examples shown here can be found in the RMarkdown source used to generate this document.
The chum life cycle begins with spawning, egg incubation and fry emergence, and shortly thereafter the downstream migration of juveniles (which we refer to as smolts). In salmonIPM we can fit three alternative spawner-recruit functions to describe the expected relationship between spawner abundance \(S_{jt}\) and smolt abundance \(M_{jt}\) in brood year \(t\) in population \(j\): density-independent discrete exponential, Beverton-Holt, and Ricker:
\[ f \left( S_{jt} | \alpha_{jt}, M_{\text{max},j}, A_{jt} \right) = \begin{cases} \alpha_{jt} S_{jt} & \text{exponential} \\ \dfrac{ \alpha_{jt} S_{jt} }{ 1 + \dfrac{ \alpha_{jt} S_{jt} }{ A_{jt} M_{\text{max},j} }} & \text{Beverton-Holt} \\ \alpha_{jt} S_{jt} \text{exp}\left(- \dfrac{ \alpha_{jt}S_{jt} }{ \text{exp}(1) A_{jt} M_{\text{max},j} } \right) & \text{Ricker} \end{cases} \]
We use a nonstandard parameterization of the Ricker in terms of maximum smolt production or “capacity” \(M_{\text{max}}\), corresponding to the mode of the function. This facilitates direct comparison with the Beverton-Holt (where \(M_{\text{max}}\) is the asymptote) and is better-identified by data than the standard parameterization based on per capita density dependence. \(M_{\text{max}}\) has units of density (fish per stream distance or area) and is then expanded to units of abundance based on habitat size \(A_{jt}\). In the case of LCR chum, this is km of spawning habitat estimated using GIS.
Intrinsic productivity \(\alpha_{jt}\) is calculated as a weighted mean of age-specific female fecundity \(\mu_{E,a}\), weighted by the spawner age distribution \(q_{jta}\), multiplied by the proportion of female spawners \(q_{\text{F},jt}\) (discounted for the proportion of females that are not “green”, i.e. not fully fecund, with discount rate \(\delta_\text{NG} \in [0,1]\)), and finally multiplied by the density-independent maximum egg-to-smolt survival \(\psi_j\):
\[ \alpha_{jt} = \psi_j q_{F,jt} \left[p_{\text{G},jt} + \delta_\text{NG} \left(1 - p_{\text{G},jt} \right) \right] \sum_{a=3}^{5} q_{jta} \mu_{E,a}. \]
The discount for partially spawned females is only relevant for one population, Duncan Channel, a constructed spawning channel in which some translocated females are believed to have already deposited eggs elsewhere. We assume the proportion of “green” females \(p_{\text{G},jt}\) is known without error since translocated spawners are individually handled and visually assessed.
This formulation implicitly assumes egg deposition is density-independent while egg-to-smolt survival is density-dependent, resulting in realized survival \(< \psi_j\) as spawner density increases. Maximum egg-to-smolt survival varies randomly among populations according to the hyperdistribution
\[ \text{logit}(\psi_j) \sim N(\mu_\psi, \sigma_\psi). \]
Maximum smolt production varies randomly among populations according to the hyperdistribution
\[ \text{log}(M_{\text{max},j}) \sim N(\mu_{M_\text{max}}, \sigma_{M_\text{max}}). \]
Given the expected smolt production, realized smolt production (the unknown “true state”) is lognormally distributed with process errors that include a common ESU-level autoregressive trend \(\eta^\text{year}_{M,t}\) with first-order autocorrelation coefficient \(\rho_{M}\) and innovation SD \(\sigma^\text{year}_{M}\), plus unique independent shocks \(\epsilon_{M,jt}\):
\[ \begin{aligned} M_{jt} &= f \left( S_{jt} | \alpha_{jt}, M_{\text{max},j}, A_{jt} \right) \, \text{exp}( \eta^\text{year}_{M,t} + \epsilon_{M,jt} ) \\ \eta^\text{year}_{M,t} &\sim N(\rho_{M} \eta^\text{year}_{M,t-1}, \sigma^\text{year}_{M}) \\ \epsilon_{M,jt} &\sim N(0, \sigma_{M}). \end{aligned} \]
In salmonIPM we can include covariates of the parameters \(\psi_j\) and \(M_{\text{max},j}\) as well as the smolt recruitment process error term. We do not consider this option further here, but we anticipate incorporating environmental covariates in future development of the LCR chum IPM.
Smolt-to-adult survival (SAR, \(s_{MS}\)) is assumed to be density-independent. The SAR process model for outmigrant cohort \(t\) in population \(j\) is logistic normal, including a common ESU-level first-order autoregressive trend \(\eta^\text{year}_{MS,t}\) with first-order autocorrelation coefficient \(\rho_{MS}\) and innovation SD \(\sigma^\text{year}_{MS}\), plus unique independent shocks \(\epsilon_{MS,jt}\):
\[ \begin{aligned} \text{logit}( s_{MS,jt} ) &= \text{logit}( \mu_{MS} ) + \eta^\text{year}_{MS,t} + \epsilon_{MS,jt} \\ \eta^\text{year}_{MS,t} &\sim N(\rho_{MS} \eta^\text{year}_{MS,t-1}, \sigma^\text{year}_{MS}) \\ \epsilon_{MS,jt} &\sim N(0, \sigma_{MS}). \end{aligned} \] As with the smolt recruitment parameters, salmonIPM can accommodate spatiotemporally varying covariates of \(s_{MS}\), but we do not discuss this further here.
Adult age structure is modeled by defining a vector of conditional probabilities, \(\mathbf{p}_{jt} = [p_{3jt}, p_{4jt}, p_{5jt}] ^ \top\), where \(p_{ajt}\) is the probability that an outmigrant in year \(t\) in population \(j\) returns at age \(a\), given that it survives to adulthood. The unconditional probability is given by \(s_{MS,jt} p_{ajt}\), where both SAR and \(p_a\) are functions of underlying annual marine survival and maturation probabilities that are nonidentifiable without some ancillary data. This parameterization resolves the nonidentifiability.
The conditional age probabilities follow a logistic normal process model with hierarchical structure across populations and through time within each population. The additive log ratio,
\[ \text{alr}(\mathbf{p_{jt}}) = \left[ \text{log} \left( \dfrac{p_{3jt}}{p_{5jt}} \right), \text{log} \left( \dfrac{p_{4jt}}{p_{5jt}} \right) \right] ^ \top \]
has a hierarchical bivariate normal distribution with population-level random effects \(\boldsymbol{\eta}^\text{pop}_{\mathbf{p}, j}\) and unique residuals \(\boldsymbol{\epsilon}_{\mathbf{p}, jt}\):
\[ \begin{aligned} \text{alr}(\mathbf{p_{jt}}) &= \text{alr}(\boldsymbol{\mu}_\mathbf{p}) + \boldsymbol{\eta}^\text{pop}_{\mathbf{p}, j} + \boldsymbol{\epsilon}_{\mathbf{p}, jt} \\ \boldsymbol{\eta}^\text{pop}_{\mathbf{p}, j} &\sim N(\mathbf{0}, \boldsymbol{\Sigma}^\text{pop}_\mathbf{p}) \\ \boldsymbol{\epsilon}_{\mathbf{p}, jt} &\sim N(\mathbf{0}, \boldsymbol{\Sigma}_\mathbf{p}). \end{aligned} \]
Here the 2 \(\times\) 2 covariance matrices \(\boldsymbol{\Sigma}^\text{pop}_\mathbf{p}\) and \(\boldsymbol{\Sigma}_\mathbf{p}\) allow correlated variation among age classes (on the unconstrained scale, not merely due to the simplex constraint on \(\mathbf{p}\)) across populations and through time within a population, respectively. For example, some populations or cohorts may skew overall younger or older than average. We parameterize each covariance matrix by a vector of standard deviations and a correlation matrix:
\[ \begin{aligned} \boldsymbol{\Sigma}^\text{pop}_\mathbf{p} &= \boldsymbol{\sigma}^\text{pop}_\mathbf{p} \mathbf{R}_\mathbf{p}^\text{pop} { \boldsymbol{\sigma}^\text{pop}_\mathbf{p} } ^ \top \\ \boldsymbol{\Sigma}_\mathbf{p} &= \boldsymbol{\sigma}_\mathbf{p} \mathbf{R}_\mathbf{p} \boldsymbol{\sigma}_\mathbf{p} ^ \top . \end{aligned} \]
Adult sex structure is modeled as the conditional probability \(p_{\text{F},jt}\) that an outmigrant from population \(j\) in year \(t\) is female, given that it survives to adulthood. The proportion of females follows a process model that includes normally distributed population-specific random effects \(\eta^\text{pop}_{\text{F}}\) with hyper-SD \(\sigma^\text{pop}_\text{F}\) and unique residuals \(\epsilon^\text{pop}_{\text{F}}\) with SD \(\sigma_\text{F}\) around the hyper-mean \(\mu_\text{F}\).
\[ \begin{aligned} \text{logit}( p_{\text{F},jt} ) &= \text{logit}( \mu_\text{F} ) + \eta^\text{pop}_{\text{F},j} + \epsilon_{\text{F},jt} \\ \eta^\text{pop}_{\text{F},j} &\sim N(0, \sigma^\text{pop}_\text{F}) \\ \epsilon_{\text{F},jt} &\sim N(0, \sigma_\text{F}) \end{aligned} \]
Recruitment of wild-origin adult spawners \(S_\text{W}\) is calculated by summing over the corresponding outmigrant cohorts and subtracting the number of spawners removed for hatchery broodstock or to spawning channels (\(B_{jt}\), assumed known without error). Fishery mortality of LCR chum is thought to be minimal.
\[ S_{\text{W}, jt} = \left(\sum_{a=3}^{5} s_{MS,j,t-a} \hspace{0.1cm} p_{aj,t-a} \hspace{0.1cm} M_{j,t-a} \right) - B_{jt} = \left(\sum_{a=3}^{5} \tilde{S}_{\text{W}, ajt} \right) - B_{jt}. \]
The proportion of naturally spawning hatchery-origin adults is modeled as an independent parameter in each population and year when hatchery programs were operational in the basin or when observed origin-frequency data indicate hatchery spawners were present. That is, we do not include a process model linking \(p_{\text{HOS},jt}\) to hatchery releases and returns, although this is planned as a component of future model development. Abundance of hatchery-origin spawners is then calculated as
\[ S_{\text{H},jt} = \dfrac{ S_{\text{W},jt} \hspace{0.1cm} p_{\text{HOS},jt} } { (1 - p_{\text{HOS},jt}) }. \]
Total spawner abundance is then
\[ S_{jt} = S_{\text{W},jt} + S_{\text{H},jt} + S^\text{add}_{jt} \] where \(S^\text{add}_{jt}\) is the number of spawners that returned to populations \(i \neq j\) and were translocated into population \(j\). This is only nonzero in the case of Duncan Channel.
Spawner age structure is given by \(\mathbf{q}_{jt} = [q_{3jt}, q_{4jt}, q_{5jt}]\), where \(q_{ajt} = \tilde{S}_{\text{W},ajt} / \sum_a{\tilde{S}_{\text{W},ajt}}\). Spawner sex structure is given by the age-weighted average of female proportions from the respective outmigrant cohorts: \(q_{\text{F},jt} = \sum_{a} {q_{ajt} \hspace{0.1cm} p_{\text{F},j,t-a}}\).
We modeled observations of fecundity from individual female chum salmon collected at hatcheries. The observation likelihood for the fecundity of female \(i\) of age \(a\) is a zero-truncated normal with age-specific mean and SD:
\[ E_{a,i}^\text{obs} \sim N_+(\mu_{E,a}, \sigma_{E,a}). \]
Empirical estimates of smolt and spawner abundance come from monitoring programs that use various sampling methods (e.g., rotary screw traps for outmigrants and weirs or mark-recapture for adults) and statistical models (e.g., the BTSPAS R package or other Bayesian time-series models) to produce a posterior distribution of the abundance of a given life-history stage in each population and year. These methods have previously been applied and their output summarized by the posterior mean and SD. We used the posterior moments of abundance to compute the log-mean \(\mu_{K,jt}\) and log-SD \(\tau_{K,jt}\) for each life stage \(K\) assuming the posterior is lognormal, which appears to be reasonable in general. We then used the observation-model posteriors as informative priors on the corresponding state variables in the IPM.
One complication is that outmigrant monitoring sites do not always have a one-to-one correspondence with independent populations. Specifically, the smolt trap in the mainstem Grays River intercepts juveniles produced in the mainstem population itself as well as the upstream West Fork Grays River and Crazy Johnson Creek populations, the latter of which is also sampled with a dedicated trap. To make the states comparable to the observations, we defined a derived state \(\tilde{M}_{jt}\) that is the sum of smolts produced in population \(j\) and all upstream populations, assuming no mortality.
\[ \begin{aligned} \text{log}(\tilde{M}_{jt}) &\sim N(\mu_{\tilde{M},jt}, \tau_{\tilde{M},jt}) \\ \text{log}(S_{jt}) &\sim N(\mu_{S,jt}, \tau_{S,jt}). \end{aligned} \]
The IPM framework naturally handles missing observations, but in some cases an estimate of the observation prior log-mean \(\mu_{K,jt}\) was available while the observation error log-SD \(\tau_{K,jt}\) was missing or unknown. These missing values were imputed within the IPM by fitting a lognormal hyperdistribution to the known log-SDs:
\[ \begin{aligned} \text{log}(\tau_{M,ij}) &\sim N( \mu_{\tau_M}, \sigma_{\tau_M}) \\ \text{log}(\tau_{S,ij}) &\sim N( \mu_{\tau_S}, \sigma_{\tau_S}). \end{aligned} \]
Spawner age-composition data are collected from scale samples in each population and year (the IPM framework naturally handles cases in which no age samples are available). Age frequencies of wild spawners \(\mathbf{n}_{ajt}^\text{obs} = [n_{3jt}^\text{obs}, n_{4jt}^\text{obs}, n_{5jt}^\text{obs}] ^\top\) are assumed to follow a multinomial observation likelihood with expected proportions given by the true state:
\[ \mathbf{n}_{ajt}^\text{obs} \sim \text{Multinomial} \left( \sum_{a=3}^5 n_{ajt}^\text{obs}, \mathbf{q}_{jt} \right). \]
Frequencies of wild- and hatchery-origin spawners are estimated based on otolith marking or genetic parentage-based tagging (PBT), and are assumed to follow a binomial observation likelihood given the total sample size assigned to rearing origin:
\[ n_{\text{H},jt}^\text{obs} \sim \text{Bin} \left( n_{\text{W},jt}^\text{obs} + n_{\text{H},jt}^\text{obs}, p_{\text{HOS},jt} \right). \]
The observed frequency of female spawners is modeled with a binomial observation likelihood given the total sample size assigned to sex:
\[ n_{\text{F},jt}^\text{obs} \sim \text{Bin} \left( n_{\text{M},jt}^\text{obs} + n_{\text{F},jt}^\text{obs}, q_{\text{F},jt} \right). \]
To complete the model specification, we need priors on all top-level or hyperparameters as well as initial states that are not generated by the process model. In the list of parameter priors below, the power-exponential (also known as generalized normal or Subbotin) density
\[ \text{PowerExp}(x | u,s,r) = \dfrac{r}{2 s \Gamma(1/r)} \text{exp} \left[ - \left( \dfrac{|x - u|}{s} \right)^r \right] \]
is used to regularize autocorrelation coefficients away from the computationally problematic boundaries 1 and -1. The Lewandowski, Kurowicka and Joe (Lewandowski et al. 2009) distribution with density \(\text{LKJ}(\eta)\), is a distribution over correlation matrices; when \(\eta = 1\) it is uniform. The prior mean and SD of log-mean smolt capacity, \(m_{M_\text{max}}\) and \(s_{M_\text{max}}\), are the sample maximum and SD of observed log smolt density, respectively. While this data-aware prior technically uses the data twice, it is broad and weakly informative and merely serves to find a reasonable scale for overall population size, similar to widely used Bayesian inference packages.
\[ \begin{aligned} \mu_{E,a} &\sim N_+(2500,500) \\ \sigma_{E,a} &\sim N_+(500,1000) \\ \delta_{\text{NG}} &\sim U(0,1) \\ \mu_\psi &\sim U(0,1) \\ \sigma_\psi &\sim N_+(0,1) \\ \mu_{M_\text{max}} &\sim N(m_{M_\text{max}}, s_{M_\text{max}}) \\ \sigma_{M_\text{max}} &\sim N_+(0,3) \\ \rho_M &\sim \text{PowerExp}(0,0.85,20) \\ \sigma^\text{year}_M &\sim N_+(0,2) \\ \sigma_M &\sim N_+(0,2) \\ \mu_{MS} &\sim U(0,1) \\ \rho_{MS} &\sim \text{PowerExp}(0,0.85,20) \\ \sigma^\text{year}_{MS} &\sim N_+(0,2) \\ \sigma_{MS} &\sim N_+(0,2) \\ \boldsymbol{\sigma}^\text{pop}_\mathbf{p} &\sim N_+(0,2) \\ \boldsymbol{\sigma}_\mathbf{p} &\sim N_+(0,2) \\ \mathbf{R}_\mathbf{p}^\text{pop} &\sim \text{LKJ}(1) \\ \mathbf{R}_\mathbf{p} &\sim \text{LKJ}(1) \\ \mu_\text{F} &\sim U(0,1) \\ \sigma^\text{pop}_\text{F} &\sim N_+(0,2) \\ \sigma_\text{F} &\sim N_+(0,2) \\ \mu_{\tau_M} &\sim N(0,1) \\ \sigma_{\tau_M} &\sim N_+(0,5) \\ \mu_{\tau_S} &\sim N(0,1) \\ \sigma_{\tau_S} &\sim N_+(0,5) \end{aligned} \]
Finally, lognormal priors on initial smolt abundance (in year 1 of each population time series) and spawner abundance (in years 1-3) are also data-aware but weakly informative, with parameters equal to the sample log-mean and 2 times the sample log-SD of observed abundance, respectively. The prior on spawner age composition in years 1-3 is simplex-uniform over the space of “orphan” age classes, i.e. those too old to have parent brood years included in the process model. The proportion of females in the same orphan age classes is given a \(\text{Beta}(3,3)\) prior to weakly regularize toward an equal sex ratio.
User inputs needed to fit the LCR chum IPM include, first, a data
frame containing observations and associated information organized by
population and year. We can look at the subset of rows of
fish_data_SMS for the Crazy Johnson Creek population (coded
Grays CJ) to see what column names and formats
salmonIPM expects:
| pop | year | A | S_obs | tau_S_obs | M_obs | tau_M_obs | downstream_trap | n_age3_obs | n_age4_obs | n_age5_obs | n_H_obs | n_W_obs | n_M_obs | n_F_obs | p_G_obs | fit_p_HOS | B_take_obs | S_add_obs | F_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grays CJ | 2004 | 643.6 | 1051 | 0.16 | NA | NA | 191 | 4 | 26 | 1 | 0 | 31 | 22 | 9 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2005 | 643.6 | 1418 | 0.03 | NA | NA | 192 | 56 | 101 | 40 | 12 | 197 | 85 | 124 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2006 | 643.6 | 3819 | 0.05 | NA | NA | 193 | 65 | 250 | 9 | 13 | 324 | 176 | 161 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2007 | 643.6 | 870 | 0.13 | NA | NA | 194 | 12 | 77 | 14 | 4 | 103 | 55 | 52 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2008 | 643.6 | 1093 | 0.13 | NA | NA | 195 | 45 | 78 | 4 | 13 | 127 | 57 | 83 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2009 | 643.6 | 996 | 0.03 | NA | NA | 196 | 74 | 26 | 2 | 3 | 102 | 47 | 58 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2010 | 643.6 | 865 | 0.13 | NA | NA | 197 | 11 | 65 | 0 | 2 | 76 | 39 | 39 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2011 | 643.6 | 2304 | 0.12 | NA | NA | 198 | 11 | 184 | 4 | 16 | 199 | 110 | 105 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2012 | 643.6 | 3475 | 0.09 | 576450 | 0.24 | 199 | 21 | 184 | 34 | 8 | 239 | 177 | 70 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2013 | 643.6 | 1925 | 0.06 | 1466141 | 0.04 | 200 | 112 | 106 | 25 | 19 | 244 | 148 | 115 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2014 | 643.6 | 1541 | 0.04 | 1101634 | 0.10 | 201 | 37 | 165 | 11 | 25 | 214 | 151 | 88 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2015 | 643.6 | 4193 | 0.08 | 419369 | 0.14 | 202 | 114 | 170 | 14 | 26 | 298 | 165 | 159 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2016 | 643.6 | 5987 | 0.06 | 1155179 | 0.06 | 203 | 31 | 236 | 26 | 10 | 294 | 165 | 139 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2017 | 965.4 | 3681 | 0.15 | 544785 | 0.07 | 204 | 96 | 195 | 79 | 25 | 371 | 206 | 190 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2018 | 965.4 | 899 | 0.02 | 1259033 | 0.07 | 205 | 122 | 46 | 4 | 8 | 172 | 102 | 78 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2019 | 965.4 | 72 | 1.17 | 276500 | 0.36 | 206 | 3 | 11 | 0 | 0 | 14 | 10 | 4 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2020 | 965.4 | 2863 | 0.10 | 8189 | 0.10 | 207 | 157 | 55 | 3 | 5 | 215 | 128 | 92 | 1 | 1 | 0 | 0 | 0 |
| Grays CJ | 2021 | 965.4 | NA | NA | 1236534 | 0.06 | 208 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
Column names generally follow the mathematical notation;
fit_p_HOS is a logical or numeric variable indicating
whether \(p_{\text{HOS},jt}\) should be
estimated or set to zero, and downstream_trap gives the row
index corresponding to the downstream smolt trap that captures
outmigrants from the given pop and year. (In
this case, Grays CJ smolts are also counted at a local
trap.)
Second, fecundity_data contains observations of
individual fecundity indexed by hatchery female. Here are the first few
rows (year and ID are not actually used in
modeling):
| year | ID | age_E | E_obs |
|---|---|---|---|
| 2009 | F-1 | 3 | 2322 |
| 2009 | F-6 | 3 | 2481 |
| 2009 | F-13 | 3 | 2930 |
| 2009 | F-19 | 3 | 2817 |
| 2009 | F-20 | 3 | 2905 |
| 2009 | F-22 | 3 | 1803 |
| 2009 | F-29 | 3 | 2284 |
| 2009 | F-32 | 3 | 2586 |
| 2009 | F-35 | 3 | 1855 |
| 2009 | F-42 | 3 | 2533 |
We can now call the function salmonIPM() to fit models
to the historical LCR chum monitoring data. Here we use the Ricker
spawner-recruit function. This model takes approximately 4 hr to fit on
a multicore 3-GHz Intel processor.
LCRchum_Ricker <- salmonIPM(fish_data = fish_data_SMS, fecundity_data = fecundity_data,
ages = list(M = 1), stan_model = "IPM_LCRchum_pp", SR_fun = "Ricker",
log_lik = TRUE, chains = 4, iter = 1500, warmup = 500,
control = list(adapt_delta = 0.99, max_treedepth = 14))
print(LCRchum_Ricker, prob = c(0.05,0.5,0.95),
pars = c("psi","Mmax","eta_year_M","eta_year_MS","eta_pop_p","mu_pop_alr_p","p","p_F",
"tau_M","tau_S","p_HOS","B_rate","E_hat","M","S","s_MS","q","q_F","LL"),
include = FALSE, use_cache = FALSE)
Inference for Stan model: IPM_LCRchum_pp.
4 chains, each with iter=1500; warmup=500; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 5% 50% 95% n_eff Rhat
mu_E[1] 2594.27 0.56 44.59 2519.98 2594.23 2668.24 6360 1.00
mu_E[2] 2857.26 0.25 23.76 2818.05 2856.92 2896.33 8914 1.00
mu_E[3] 2866.31 0.95 72.83 2745.97 2866.69 2984.15 5821 1.00
sigma_E[1] 511.00 0.41 33.01 461.47 509.49 567.92 6341 1.00
sigma_E[2] 560.31 0.23 17.08 533.51 559.85 589.65 5655 1.00
sigma_E[3] 436.92 0.78 54.68 358.03 430.64 532.87 4878 1.00
delta_NG 0.64 0.00 0.22 0.25 0.67 0.96 2354 1.00
mu_psi 0.62 0.00 0.08 0.49 0.61 0.76 827 1.00
sigma_psi 0.47 0.01 0.30 0.07 0.43 1.02 1055 1.00
mu_Mmax 7.15 0.02 0.52 6.37 7.12 8.06 854 1.01
sigma_Mmax 1.30 0.01 0.41 0.78 1.22 2.08 1474 1.00
rho_M -0.01 0.01 0.31 -0.52 -0.02 0.54 1128 1.00
sigma_year_M 0.46 0.00 0.11 0.30 0.45 0.66 1252 1.00
sigma_M 0.31 0.00 0.05 0.24 0.31 0.39 913 1.00
mu_MS 0.00 0.00 0.00 0.00 0.00 0.00 1708 1.00
rho_MS 0.51 0.01 0.22 0.09 0.55 0.80 691 1.01
sigma_year_MS 1.08 0.01 0.22 0.78 1.05 1.48 1270 1.00
sigma_MS 0.51 0.00 0.05 0.44 0.51 0.60 1086 1.00
mu_p[1] 0.24 0.00 0.02 0.21 0.24 0.28 695 1.00
mu_p[2] 0.72 0.00 0.02 0.69 0.72 0.75 846 1.00
mu_p[3] 0.04 0.00 0.00 0.03 0.04 0.05 616 1.00
sigma_pop_p[1] 0.17 0.01 0.14 0.01 0.13 0.44 534 1.00
sigma_pop_p[2] 0.11 0.00 0.10 0.01 0.09 0.30 557 1.00
R_pop_p[1,1] 1.00 NaN 0.00 1.00 1.00 1.00 NaN NaN
R_pop_p[1,2] 0.33 0.02 0.57 -0.78 0.50 0.97 1078 1.00
R_pop_p[2,1] 0.33 0.02 0.57 -0.78 0.50 0.97 1078 1.00
R_pop_p[2,2] 1.00 0.00 0.00 1.00 1.00 1.00 4294 1.00
sigma_p[1] 1.69 0.00 0.13 1.49 1.68 1.91 739 1.01
sigma_p[2] 0.91 0.00 0.09 0.77 0.91 1.07 815 1.01
R_p[1,1] 1.00 NaN 0.00 1.00 1.00 1.00 NaN NaN
R_p[1,2] 0.78 0.00 0.05 0.69 0.79 0.86 979 1.00
R_p[2,1] 0.78 0.00 0.05 0.69 0.79 0.86 979 1.00
R_p[2,2] 1.00 0.00 0.00 1.00 1.00 1.00 3902 1.00
mu_F 0.50 0.00 0.02 0.47 0.50 0.53 1134 1.00
sigma_pop_F 0.20 0.00 0.07 0.11 0.19 0.33 1136 1.00
sigma_F 0.36 0.00 0.04 0.30 0.36 0.43 1573 1.00
mu_tau_M 0.07 0.00 0.01 0.06 0.07 0.09 4614 1.00
sigma_tau_M 1.09 0.00 0.11 0.94 1.08 1.29 4708 1.00
mu_tau_S 0.11 0.00 0.01 0.10 0.11 0.13 4136 1.00
sigma_tau_S 0.99 0.00 0.05 0.91 0.99 1.09 4292 1.00
lp__ -42311.37 1.12 37.86 -42375.79 -42310.64 -42249.69 1147 1.00
Samples were drawn using NUTS(diag_e) at Mon Jun 13 16:35:48 2022.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The results (showing only hyperparameters for readability) suggest
that the Markov chains mixed adequately, since the Gelman-Rubin \(\widehat{R}\) diagnostics are all ≤ 1.01
and effective sample sizes \(n_\text{eff}\) are sufficient for computing
90% credible intervals. We may get false-positive warnings that some
\(\widehat{R}\) and \(n_\text{eff}\) estimates are
NaN, but these are for fixed diagonal elements of
correlation matrices. We could further explore the chains and HMC
diagnostics using visualizations in the shinystan
package.
In principle we could fit all three spawner-recruit functions and compare them based on estimated out-of-sample predictive performance using the approximate leave-one-out cross-validation score (LOO; Vehtari et al. 2017). In practice we find that LOO estimates for IPMs fitted to the LCR chum data are too unstable to be usable (excessive Pareto k-values indicating that some observations are highly influential on the posterior), requiring brute-force cross-validation or customized methods. These are under development, but in the meantime we proceed to use the Ricker model for inference since it is biologically plausible and fits the data reasonably well.
The plot below summarizes posterior distributions of the main hyperparameters and states across the life cycle. In panels C-H, thick lines and shading correspond to the hyper-mean or ESU level and thin lines correspond to each of the 12 populations in the data set. In panels E-H, lines indicate posterior medians and shading indicates 90% posterior credible intervals at the ESU level (uncertainty around population-level curves is omitted for clarity).
There is clear evidence of density-dependence, with more among-population heterogeneity in smolt capacity (millions of smolts per km of spawning habitat) than in maximum egg-to-smolt survival, but most populations exhibit little if any overcompensation over the range of observed spawner densities (shown by the x-axis rug in E-F). Both freshwater productivity (total smolt process error, G) and SAR are strongly synchronous at the ESU scale. Freshwater productivity is relatively stable except for the sharp drop in 2019. Smolt-to-adult survival, by contrast, shows strong and quasiperiodic fluctuations consistent with large-scale oceanographic or other environmental drivers. Because of this large and as-yet-unexplained process stochasticity, the uncertainty around the final 4 years of SAR estimates (corresponding to cohorts that have not yet fully returned as adults, and thus informed more by the process model than by data) is very wide.
We can look more closely at freshwater productivity by plotting the spawner-recruit functions for each population along with the observations and states. Here the lines and dark inner shading indicate the posterior median and 90% credible interval of the fitted Ricker curve, and the light outer shading represents total (shared plus unique) process error. (Due to the high reported precision of the observations, observation error makes only a minor contribution to the posterior predictive distribution and is not shown.) The black points are the spawner and smolt data for the five populations with smolt traps. Note that the trap counts for Grays MS represent the sum of production in the mainstem and in the upstream Grays CJ and Grays WF populations, as discussed above, and thus the observations in that case are not directly comparable to the states and fitted values. Open gray points are state estimates, with error bars (in some cases truncated for clarity) indicating 90% credible intervals.
The data are sparse, but overall the Ricker appears to be an adequate fit. The states with noticeably higher uncertainty generally correspond to cases with missing spawner observations and/or missing observation error estimates, as we will see in the time series plots below.
We now examine model fits to individual data components in more detail.
Fecundity was higher for 4- and 5-year old females than for the youngest age class. Here the histograms show observations and the lines and shading show the posterior median and 95% credible interval, respectively, of the estimated zero-truncated normal density, which fits the data reasonably well.
The next plot shows observed (observation-model posterior medians; points) and estimated spawner abundance for each population. Filled points indicate observations whose error SD is known (90% intervals are sometimes too narrow to be visible), while the SD for open points is imputed. The solid gray line is the posterior median of the true state from the IPM. Posterior 90% credible intervals indicate process (dark shading) and observation (light shading) uncertainty, i.e. the posterior predictive distribution (PPD) of hypothetical replicate data under the model. Leading or trailing years without spawner abundance estimates are included if smolt abundance is available.
Fits to the smolt abundance data, following the same conventions as
for spawner abundance, immediately show how much sparser the juvenile
sampling is. Only six populations are monitored with smolt traps
(including Grays WF, which along with Grays CJ
is incorporated into the summed smolt observations and estimates for the
downstream Grays MS trap) and several of those time series
are incomplete. For that reason, a greater share of the information
about the underlying state comes from the process model, which induces
synchronous fluctuations.
To understand how the IPM imputes the abundance observation error SD in cases where it is not reported, we can plot the lognormal hyperdistribution (posterior median and 90% credible interval) fitted to the known SD values (histogram).
Next we compare the estimated true spawner age-frequencies (posterior median and 90% credible interval) to the observed sample proportions (with 90% binomial confidence interval). Age composition varies considerably across populations and through time, tracking fluctuations in cohort strength. The precision of the estimated states follows the sample size, with smaller samples giving relatively more weight to the hierarchical process model.
Plots of the estimated adult sex ratio, following a similar format, likewise show strong interannual fluctuations and some systematic deviations from a 1:1 sex ratio across populations.
Proportion of hatchery-origin spawners was generally low, with some very small samples that cannot definitively rule out higher \(p_\text{HOS}\). In terminal years when spawner data are not yet available but hatcheries are known to be operating in the population, the estimates revert back to the independent \(U(0,1)\) prior. This is not a problem, but will change if we are able to expand the process model to include hatchery releases and returns, which would generate predictions of \(p_{\text{HOS},jt}\) (see Next Steps).
It is straightforward to use the IPM to forecast future population
dynamics, given the observed historical data. We can simply pad
fish_data_SMS with empty rows corresponding to future years
in each population and re-fit the model. Estimated states will then
include the future years, just as they would any past years with missing
data. This seamless integration of retrospective fitting and forward
projection is a major advantage of the IPM approach; there is no need to
separately specify initial conditions or devise ad hoc methods for
simulating parameter uncertainty or process noise.
In this example, we forecast adult returns in 2022 given the data available through 2021, albeit not for all variables in all populations (in particular, as seen in the plot below, 2021 adults were not available at the time of writing). Although the posterior samples include all the states in the model – for example, we could forecast smolt abundance or adult age structure – we focus here on total wild-origin spawners of all ages.
This plot is identical to the retrospective fits to spawner abundance except for the added final year. The forecast intervals are much wider than the credible intervals around the retrospective estimates; this is expected in general, of course, but is especially pronounced because of the large process-error fluctuations in SAR. As this logit-AR(1) process is simulated forward in time, the marine survival rates and associated population dynamics will rapidly blow up, converging to a wide stationary distribution. Within a single chum generation, however, the forecast is constrained by observations of the abundance of parent cohorts and outmigrants as well as younger age-classes whose older siblings will return in 2022. Thus the IPM automatically incorporates the information used in “sibling regression” models (Peterman 1982), but without the restrictive assumption of constant within-cohort age structure.
The table below gives a more detailed look at the projections (posterior medians and 90% forecast intervals) of wild spawner abundance in 2022.
| Population | Estimate |
|---|---|
| Hamilton Channel | 1522 (277, 12651) |
| Hamilton Creek | 1086 (189, 9619) |
| Ives | 2743 (376, 26409) |
| Hardy Creek | 340 (47, 3519) |
| Duncan Channel | 161 (28, 1288) |
| Horsetail | 470 (66, 4644) |
| St Cloud | 270 (40, 2699) |
| Multnomah | 680 (102, 5800) |
| I-205 | 3732 (647, 32237) |
| Grays CJ | 1442 (284, 9717) |
| Grays MS | 6071 (985, 53918) |
| Grays WF | 7314 (1273, 59702) |
| Total | 29466 (6554, 207984) |
As shown in the figures above, the LCR chum IPM currently generates estimates (or forecasts) of a number of management-relevant quantities and reporting metrics:
We anticipate that future model development will add to this list, particularly for metrics related to hatchery program effectiveness (e.g., SAR) and contribution to wild populations (e.g., program-specific \(p_{\text{HOS}}\)).
The LCR chum IPM remains under active development, with an ultimate goal of providing decision support for adaptive management of habitat restoration, hatchery supplementation, and monitoring design. Toward those ends, our highest priorities for further model development in the near and medium term include the following:
Habitat and environmental drivers. Incorporate covariates (e.g., freshwater habitat quality, oceanographic factors) into the IPM to estimate effects on stage-specific productivity, capacity, and/or survival. For example, strong leading indicators of SAR could reduce the unexplained process noise and thus help to constrain near-term forecasts.
Hatchery operations. Expand the IPM to model hatchery fry / smolt releases and adult returns, replacing the current unstructured estimates of \(p_\textrm{HOS}\) with a process-based description of naturally reproducing hatchery-origin spawners (HOR). This will allow jointly modeling wild and hatchery SAR, which may help to inform estimates of common trends and environmental covariate effects. A longer-term goal is to explicitly model hatchery operations, i.e., broodstock transfers into hatchery or quasi-hatchery populations and fry production.
Straying and spatial connectivity. In tandem with modeling hatchery releases and returns, we plan to model dispersal among populations using a straying matrix estimated using origin data from hatcheries and spawning channels. This could potentially affect inferences about metapopulation viability and allow scenario analyses related to recolonization.
Incorporate data from less-intensively monitored Lower Columbia chum populations not currently included in our data set, such as the Elochoman and Skamokawa. As the number of populations is relatively small for fitting a complex hierarchical life cycle model, any additional data could improve the estimates for the entire ESU and for individual populations by sharing information.
Scenario analyses. Use the parameterized IPM to evaluate life-cycle bottlenecks (sensitivity to specific vital rates) and to ask questions about monitoring design (e.g., data gaps and value of specific data types relative to their cost), prioritization of habitat restoration, effectiveness of supplementation, and potential rates of recolonization.
In FY2021 we focused on objectives (2) and (3) in the list above. We developed a model component describing the survival of hatchery chum from smolt release to adult return, with SAR drawn from the same hyperdistribution as the wild populations (see Model Structure: Smolt-to-Adult Survival), though potentially with a dummy covariate to allow differential marine survival by rearing type. Returning hatchery-origin adults are then distributed across natural populations using a dispersal matrix \(\mathbf{p}_\textrm{origin}\), assumed to be time-invariant given the relatively sparse recovery data we have available. As a rough visualization of this straying matrix, the plot below shows estimates of the proportion of adults from each identifiable origin (hatchery or spawning channel) that return to each natural population. The bars within each source sum to 1. These crude estimates are based on expanding known-origin composition samples by the point estimates of observed spawners in each year and then summing across years; they give no indication of the observation error or sample size involved. The IPM is not fitted to these derived estimates, but rather the underlying raw origin-composition data is a multinomial likelihood component.
This hatchery smolt-to-adult survival and straying component has been implemented in the Stan code, but we have been unsuccessful so far in fitting the expanded model because initial parameter values lead to a negative infinite log-posterior. It appears that more work is needed to fine-tune the values used to initialize the HMC chains so that sampling can proceed. We anticipate that these challenges can be overcome, and the resulting model will provide estimates of hatchery SAR and spatial dispersal. It will also enable us to ask questions about the effects of alternative hatchery release levels on natural population trends and viability, and serve as the first step toward a process-based model of the full hatchery life cycle (broodstock to adult return).